Spatial Bias in PM2.5 Monitoring Networks in Los Angeles County
DSAN 6750: GIS for Spatial Data Science
Introduction
Los Angeles County’s PM2.5 monitoring network is the backbone for estimating exposure, evaluating compliance, and targeting mitigation resources. Yet coverage remains uneven, especially relative to the sprawling freeways that channel both commuters and pollution. I focus on Los Angeles County because freeway-driven emissions are a dominant exposure pathway there, and any monitoring bias toward road corridors can meaningfully distort county-wide exposure assessments. This project set out to quantify how well the existing network samples space, whether monitors cluster along transportation corridors, and how those spatial patterns could bias our view of air quality. Framing the findings in a public-facing website gives local agencies and residents a transparent look at where data gaps remain.
Background & Research Questions
- RQ1. Are PM2.5 monitors more concentrated near major roads?
- RQ2. Do monitoring stations display spatial clustering beyond complete spatial randomness (CSR)?
Why it matters.
- Systematic spatial bias undermines regional exposure estimates.
- Road-adjacent siting can overstate traffic-related pollution while missing other sources.
- Spatial inequities have environmental justice implications across Los Angeles’s diverse neighborhoods.
Study area. Los Angeles County (study window area = 10,598.78 km², computed directly from the EPSG:3310 boundary) with 12 regulatory PM2.5 monitors retained after quality control; the boundary and covariates come from TIGER/Line and OpenStreetMap, respectively.
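Because EPSG:3310 is an equal-area projection in meters, the study-window area can be computed with ordinary planar geometry. The sketch below is a minimal NumPy illustration that applies the shoelace formula to a single ring of projected vertices. The function name polygon_area_km2 is hypothetical. A real county boundary may be multipart, since Los Angeles County includes offshore islands, and in that case the areas of the parts are summed, as a GIS library's area attribute does.

```python
import numpy as np


def polygon_area_km2(xy_m):
    """Planar area (km^2) of a simple polygon ring via the shoelace formula.

    xy_m: sequence of (x, y) vertices in meters from an equal-area projection
    such as EPSG:3310. The ring may be open or explicitly closed.
    """
    xy = np.asarray(xy_m, dtype=float)
    if len(xy) > 1 and np.allclose(xy[0], xy[-1]):
        xy = xy[:-1]  # drop the repeated closing vertex
    x, y = xy[:, 0], xy[:, 1]
    # Shoelace: A = 0.5 * |sum_i (x_i * y_{i+1} - x_{i+1} * y_i)|
    area_m2 = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))
    return area_m2 / 1e6  # 1 km^2 = 1e6 m^2


# A 10 km x 10 km square in projected meters
square = [(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)]
print(polygon_area_km2(square))  # 100.0
```

This formula gives correct areas only in a projected, equal-area CRS. Applied to raw latitude/longitude degrees, it would produce a meaningless number.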
Literature Review
Two strands of scholarship motivate the analysis. First, environmental justice research shows that pollution burdens often fall on communities of color living near major roadways, while monitoring resources lag behind (e.g., Clark et al., 2017; Woo et al., 2019). Second, the spatial statistics literature warns that poorly placed monitors bias trend detection because kriging and exposure models inherit the first-order (intensity) and second-order (interaction) structure of the input points (Diggle, 2013). Taken together, these findings imply that siting monitors near roads could capture the highest emissions, but failing to cover inland valleys or the harbor might miss hotspots entirely. The regression and point-pattern diagnostics used here build directly on these theoretical insights.
Methodology
- Data assembly. Daily PM2.5 values (EPA AQS, 2016–2023) were joined to site metadata to obtain land use, location setting, and coordinates. TIGER/Line county polygons defined the study boundary, while OpenStreetMap supplied the major-road network.
- Spatial preprocessing. Daily values for sites inside Los Angeles County were averaged to a single mean PM2.5 value per monitor, and only unique stations were retained after filtering. All spatial layers were projected to EPSG:3310 (California Albers, an equal-area projection in meters), saved as GeoPackages/shapefiles, and cached for reproducibility (prepare_data.qmd and eda.qmd).
- Point-pattern construction. The monitor centroids defined a planar point pattern object (monitor_ppp), the county polygon became the observation window, and major roads were converted to a line-segment pattern to compute distance-based covariates (methods.qmd).
- Statistical workflow. First-order structure was examined through kernel intensity surfaces and exploratory plots. Second-order structure (interaction between points) was assessed with pair-correlation and L-function envelopes based on 999 simulations. Finally, an inhomogeneous Poisson regression tested whether monitor intensity decays with distance from major roads.
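The distance-to-road covariate from the point-pattern step reduces to a point-to-segment distance in projected meters. The following is a minimal NumPy sketch. It assumes each major road has already been split into straight segments, with start and end coordinates in EPSG:3310. The function name nearest_segment_distance is hypothetical and does not come from methods.qmd.

```python
import numpy as np


def nearest_segment_distance(points, seg_start, seg_end):
    """Euclidean distance (meters) from each point to its nearest line segment.

    points: (n, 2); seg_start, seg_end: (m, 2) segment endpoints,
    all in a projected CRS such as EPSG:3310.
    """
    p = np.asarray(points, dtype=float)[:, None, :]      # (n, 1, 2)
    a = np.asarray(seg_start, dtype=float)[None, :, :]   # (1, m, 2)
    b = np.asarray(seg_end, dtype=float)[None, :, :]     # (1, m, 2)
    ab = b - a
    denom = np.sum(ab * ab, axis=-1)                     # squared segment lengths
    # Projection parameter of each point onto each segment, clamped to [0, 1];
    # zero-length segments fall back to their start point (t = 0)
    t = np.sum((p - a) * ab, axis=-1) / np.where(denom > 0, denom, 1.0)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[..., None] * ab                      # (n, m, 2)
    d = np.linalg.norm(p - closest, axis=-1)             # (n, m)
    return d.min(axis=1)


# Two toy "road" segments and four toy monitor locations (meters)
d = nearest_segment_distance(
    [[500, 300], [-400, 0], [1300, 400], [0, 2500]],
    seg_start=[[0, 0], [0, 2000]],
    seg_end=[[1000, 0], [0, 3000]],
)
print(d)  # [300. 400. 500.   0.]
```

For real monitor coordinates, np.median(d) / 1000 gives a median distance in kilometers, comparable to the road-proximity summary. The brute-force n × m comparison is fine for a dozen monitors. Evaluating the covariate on a dense grid against a full OSM network would call for a spatial index.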
Exploratory Data Analysis (EDA)
Los Angeles County hosts a relatively small regulatory PM2.5 network, and the average concentration it records is 8.84 µg/m³ (sd = 3.42 µg/m³; range roughly 1–13 µg/m³). Most stations cluster in the San Gabriel Valley and South Bay, leaving the Antelope Valley and southeastern industrial belt sparsely monitored. The paired maps below show coverage and the spatial gradient in mean PM2.5.
Adding major roads clarifies how strongly siting decisions follow the transportation network: monitors hug the spine of the I-10/I-710 corridor and the South Bay freeway complex, while the northern desert and harbor periphery remain nearly blank. This figure mirrors the “Background” slide in the deck so the web page visually matches the presentation.
The same pattern is visible when we color each site by its distance to the nearest major road—warm colors sit directly on arterials, underscoring a design focused on traffic exposure.
Complementary scatterplots show mean PM2.5 declining as monitors sit farther from major routes. This relationship concerns concentrations rather than where monitors are placed, but it motivates using distance to road as the key covariate in the formal point-process models.
Additional EDA revealed:
- Road proximity. The median monitor lies only 0.37 km from a major road (min 0.01 km; max 1.72 km), hinting at siting preferences tied to accessibility and emission sources.
- Land-use distribution. Urban land-use designations dominate the sample, so rural background and coastal zones are underrepresented (eda.qmd provides boxplots and LISA diagnostics).
- Spatial dependence. Kernel-density surfaces show a ridge of monitors along the I-10/I-710 corridor. LISA statistics highlight a High–High cluster south of downtown, where monitors with elevated mean PM2.5 are surrounded by neighbors that also report elevated values.
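Local Moran's I can illustrate the LISA classification in the spatial-dependence bullet. This minimal NumPy sketch rests on assumptions that eda.qmd does not state: row-standardized k-nearest-neighbor weights (k = 3 here), a z-scored variable, and no permutation-based significance test. It therefore labels quadrants but does not say which clusters are significant. The function name local_moran and the toy data are hypothetical.

```python
import numpy as np


def local_moran(values, coords, k=3):
    """Local Moran's I with row-standardized k-nearest-neighbor weights.

    Returns (I_i, quadrant labels). No permutation inference is performed.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()                    # standardized variable
    # Pairwise distances; a point is never its own neighbor
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nbrs] = 1.0 / k        # each row sums to 1
    lag = W @ z                                     # spatial lag: neighbor average
    I = z * lag
    labels = np.where(z > 0,
                      np.where(lag > 0, "High-High", "High-Low"),
                      np.where(lag > 0, "Low-High", "Low-Low"))
    return I, labels


# Toy data: one high-PM2.5 cluster and one low-PM2.5 cluster (meters, µg/m³)
coords = [(0, 0), (100, 0), (0, 100), (100, 100),
          (10_000, 10_000), (10_100, 10_000), (10_000, 10_100), (10_100, 10_100)]
pm25 = [20, 21, 19, 22, 2, 3, 1, 4]
I, labels = local_moran(pm25, coords, k=3)
print(labels)
```

With only a dozen monitors, the choice of k strongly shapes which neighbors count as "local". Any High–High label should be checked with a permutation test before it is interpreted.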
Spatial Modeling Visuals
The figures below (rendered in EPSG:3310 meters) connect qualitative impressions with the formal statistical tests.
Key read-outs:
- Kernel vs. model agreement. The kernel smoother and the model-based intensity surface both peak over the freeway-heavy basin, indicating the distance covariate captures where monitors already cluster.
- Clustering beyond CSR. The observed L-function lies above the CSR envelope over a broad band of distances, indicating clustering beyond complete spatial randomness.
- Distance effect visualization. Together with the distance-to-road map and scatter plot, these figures visually reinforce the regression finding that intensity declines rapidly with distance from major corridors.
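The kernel smoother in the first read-out estimates intensity by placing a small bump of probability mass on each monitor and summing. Here is a minimal sketch. It assumes an isotropic Gaussian kernel with a fixed bandwidth in meters and no edge correction; the notebooks' actual bandwidth and edge handling are not given here. The function name kernel_intensity is hypothetical.

```python
import numpy as np


def kernel_intensity(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel estimate of point intensity (points per m^2) on a grid.

    points: (n, 2) coordinates in meters; grid_x, grid_y: 1-D cell-centre
    coordinates; bandwidth: kernel standard deviation in meters.
    No edge correction, so estimates near the window boundary are biased low.
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    pts = np.asarray(points, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)              # each (ny, nx)
    dx = gx[..., None] - pts[:, 0]                    # (ny, nx, n)
    dy = gy[..., None] - pts[:, 1]
    # Bivariate normal density centred on each point, summed over points
    dens = np.exp(-(dx**2 + dy**2) / (2.0 * bandwidth**2)) / (2.0 * np.pi * bandwidth**2)
    return dens.sum(axis=-1)


pts = [(0, 0), (1_000, 0)]
grid_x = np.arange(-5_000, 6_001, 50.0)
grid_y = np.arange(-5_000, 5_001, 50.0)
lam = kernel_intensity(pts, grid_x, grid_y, bandwidth=500.0)
# Integrating the intensity over the grid recovers the number of points
print(round(lam.sum() * 50.0 * 50.0, 3))  # 2.0
```

Multiplying lam by 1e6 converts the result to monitors per km². The bandwidth controls how sharp or smooth a corridor ridge appears, so it deserves the same scrutiny as any model parameter.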
Hypothesis Testing (Regression)
To formalize the “near-road bias” hypothesis, I fit an inhomogeneous Poisson point-process model with monitor intensity as a log-linear function of distance to the nearest major road:
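\[ \log \lambda(s) = \beta_0 + \beta_1 \, d(s) \]
where \( \lambda(s) \) is the expected monitor density (intensity) at location \( s \) and \( d(s) \) is the distance in meters from \( s \) to the nearest major road. The fitted coefficients (methods.qmd) were:
- \( \beta_0 = -13.32 \) (SE = 0.019, p < 0.001)
- \( \beta_1 = -0.00111 \) per meter (SE = 2.5e-5, p < 0.001)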
The estimated distance-to-road effect is negative and statistically significant, consistent with greater monitor density near major roads. Because the model is log-linear, each additional kilometer from a major road multiplies the fitted intensity by \( \exp(-0.00111 \times 1000) \approx 0.33 \), a drop of roughly two-thirds. The fitted intensity therefore halves about every 620 m (\( \ln 2 / 0.00111 \approx 624 \) m). Quadrat tests and Monte Carlo envelopes show no major lack of fit, suggesting the distance covariate captures a dominant first-order trend. However, residual second-order clustering persists, and because the network contains so few stations, purely spatial terms have limited power to account for it.
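A log-linear intensity model of this form can be approximated by Poisson regression on grid-cell counts, with log(cell area) as an offset. This approximation approaches the point-process likelihood as the cells shrink. The sketch is not the methods.qmd fit. It simulates a synthetic pattern by thinning around a hypothetical straight road along y = 0 (true \( \beta_1 = -0.001 \) per meter), then recovers the coefficients with Newton's method. Every function name and parameter value is illustrative.

```python
import numpy as np


def poisson_loglik(beta, X, y, offset):
    """Poisson log-likelihood up to the constant -sum(log y!)."""
    eta = offset + X @ beta
    return float(np.sum(y * eta - np.exp(eta)))


def fit_loglinear_intensity(counts, dist, cell_area, max_iter=100):
    """Fit log lambda(s) = b0 + b1 * d(s) by Poisson regression on grid cells.

    counts: points per cell; dist: distance to road (m) at each cell centre;
    cell_area: cell area in m^2, entering through the offset log(cell_area).
    Returns (beta, standard errors from the inverse Fisher information).
    """
    y = np.asarray(counts, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(dist, dtype=float)])
    offset = np.full_like(y, np.log(cell_area))
    # Start from the homogeneous (CSR) fit: constant intensity, zero slope
    beta = np.array([np.log(y.sum() / (cell_area * y.size)), 0.0])
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])              # Fisher information
        step = np.linalg.solve(hess, grad)
        # Step-halving keeps every Newton update uphill
        t, ll_old = 1.0, poisson_loglik(beta, X, y, offset)
        while poisson_loglik(beta + t * step, X, y, offset) < ll_old and t > 1e-8:
            t /= 2.0
        beta = beta + t * step
        if np.max(np.abs(t * step)) < 1e-12:
            break
    mu = np.exp(offset + X @ beta)
    cov = np.linalg.inv(X.T @ (X * mu[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Synthetic demo (NOT the LA monitors): a straight "road" along y = 0
rng = np.random.default_rng(42)
side, cell = 50_000.0, 500.0                # 50 km square window, 500 m cells
b0_true, b1_true = np.log(8e-6), -0.001     # log intensity per m^2; slope per meter
# Thinning: homogeneous points at the peak intensity, kept with prob exp(b1 * d)
n_max = rng.poisson(np.exp(b0_true) * side**2)
x = rng.uniform(0.0, side, n_max)
yy = rng.uniform(0.0, side, n_max)          # distance to the road equals y here
keep = rng.random(n_max) < np.exp(b1_true * yy)
edges = np.arange(0.0, side + cell, cell)
counts, _, _ = np.histogram2d(x[keep], yy[keep], bins=[edges, edges])
centres = (edges[:-1] + edges[1:]) / 2
dist = np.broadcast_to(centres[None, :], counts.shape)   # columns index y bins
beta_hat, se_hat = fit_loglinear_intensity(counts.ravel(), dist.ravel(), cell**2)
print(beta_hat, se_hat)
```

All cells have the same size, and the synthetic intensity varies only with distance. The cell-centre approximation therefore shifts \( \beta_0 \) slightly but leaves the slope estimate essentially unbiased. The standard errors assume independent Poisson counts, and residual clustering of the kind described above would make them too small.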
Key Findings
Question 1: Are monitors more concentrated near major roads?
✅ Yes—strong systematic placement bias.
- The estimated distance-to-road effect is negative and statistically significant, consistent with greater monitor density near major roads.
- The fitted coefficient implies that intensity decays rapidly with distance from major corridors.
- The major-road overlay and distance maps above provide the visual counterpart to this statistical result.
Question 2: Do monitors show clustering beyond CSR?
✅ Yes—observable spatial aggregation.
- The observed L-function lies above the CSR envelope over a broad range of distances, indicating clustering beyond complete spatial randomness.
- Kernel intensity contours show the same clustering ridge, and the inhomogeneous model residuals remain spatially structured.
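The CSR comparison behind the L-function result works in two steps. First, estimate \( L(r) = \sqrt{K(r)/\pi} \) for the observed points. Second, repeat the calculation for 999 same-sized patterns placed uniformly at random and take the pointwise minima and maxima. This minimal NumPy version assumes a rectangular window and omits edge correction. Because the simulated patterns use the same estimator in the same window, the comparison stays like-for-like, but it does not substitute for the county-shaped window used in the project. The three-cluster pattern and the function names are hypothetical.

```python
import numpy as np


def l_function(points, r, area):
    """Ripley's K without edge correction, transformed to L(r) = sqrt(K(r) / pi)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d = d[~np.eye(n, dtype=bool)]  # distances for ordered pairs i != j
    # K(r) = |W| / (n (n - 1)) * #{ordered pairs within distance r}
    pair_counts = np.array([(d <= ri).sum() for ri in r])
    k = area * pair_counts / (n * (n - 1))
    return np.sqrt(k / np.pi)


def csr_envelope(n, width, height, r, nsim=999, seed=0):
    """Pointwise min/max of L(r) over nsim CSR patterns of n points in a rectangle."""
    rng = np.random.default_rng(seed)
    sims = np.empty((nsim, len(r)))
    for s in range(nsim):
        pts = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
        sims[s] = l_function(pts, r, width * height)
    return sims.min(axis=0), sims.max(axis=0)


# Synthetic example: three tight clusters of four points in a 100 km square
centres = np.array([[20_000, 20_000], [70_000, 30_000], [40_000, 80_000]], dtype=float)
offsets = np.array([[0, 0], [500, 0], [0, 500], [500, 500]], dtype=float)
obs = (centres[:, None, :] + offsets[None, :, :]).reshape(-1, 2)  # 12 points
r = np.arange(1_000, 20_001, 1_000, dtype=float)
side = 100_000.0
L_obs = l_function(obs, r, side**2)
lo, hi = csr_envelope(len(obs), side, side, r)
print("distances (m) where L_obs is above the envelope:", r[L_obs > hi])
```

A pointwise envelope flags each distance separately. Reading "above the envelope over a broad band of distances" as one overall test overstates its significance, and a global envelope addresses that.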
Implications.
- Systematic spatial bias in network design means many neighborhoods lack a local monitor.
- Road-proximate areas are over-monitored, potentially overemphasizing traffic pollution relative to industrial or port sources.
- Exposure assessments should correct for spatial sampling bias or augment the network (e.g., low-cost sensors) in underserved zones to address environmental justice concerns.
Together, these sections compile every figure and result from the data prep, EDA, and methods notebooks into a single landing page—no additional tabs required.
Conclusion
This project combined regulatory PM2.5 records, TIGER boundaries, and OSM road data to build a reproducible point-pattern analysis of Los Angeles’s monitoring network. Exploratory maps, kernel intensity estimates, and CSR envelope tests all confirmed a consistent story: only a dozen regulatory monitors cover a 10,598.78 km² county and most of them sit within a few hundred meters of the same freeway spine. The inhomogeneous Poisson regression quantified this bias by showing that fitted intensity declines sharply as distance from major roads increases, while the L-function indicated clustering well beyond random placement—evidence that siting decisions have been path dependent rather than coverage-driven.
From a policy and business perspective, the current PM2.5 network misrepresents where Angelenos actually live and breathe. Air district planners, consultants, and sustainability teams who rely on these monitors to estimate compliance or prioritize investments could easily mis-target resources: the network over-samples freeway corridors (where mitigation projects are already expensive) and under-samples inland warehouses, port communities, and rapidly growing suburbs. The quantitative evidence presented here—steep distance penalties in the Poisson model and clear clustering beyond CSR—provides a defensible case for reallocating capital toward new sensors in underserved neighborhoods. Investing in additional monitors or low-cost sensor deployments now would de-risk regulatory enforcement, improve environmental justice reporting, and supply more reliable data to businesses that must justify clean-air upgrades.